Workload scheduler with resource optimization factoring

ABSTRACT

A workload scheduler supporting an efficient distribution and balancing of the workload is proposed. The scheduler maintains (383-386) a profile for each job; the profile (built using statistics of previous executions of the job) defines an estimated usage of different resources of the system by the job. The scheduler tends to select (318-342) the jobs with complementary resource requirements (according to a combination of their attributes); this process can be carried out using either a heuristic approach (318-334) or an optimization approach (335-342). As a result, the jobs that will be submitted are very likely to consume different resources of the system; in this way, any contention for the different resources is strongly reduced.

TECHNICAL FIELD

The present invention relates to the data processing field, and more specifically to a scheduling method.

BACKGROUND ART

Scheduling methods are commonly used in a data processing system to control submission of different work units to be executed (for example, jobs in a batch processing). For this purpose, several types of workload schedulers have been proposed in recent years to automate the submission of large quantities of jobs. An example of a scheduler is described in “End-to-End Scheduling with Tivoli Workload Scheduler 8.1”, V. Gucer, S. Franke, F. Knudsen, M. A. Lowry, ISBN 0738425079.

A scheduler submits the jobs according to a predefined plan. The plan establishes a flow of execution of the jobs according to several factors; typically, the factors affecting the flow of execution include temporal values (such as date, time, day of the week) and dependencies (such as completion of predecessor jobs or system resource availability).

The schedulers known in the art are very sophisticated in handling temporal and predecessor constraints. However, only very basic support is available for managing the problems relating to the availability of the resources that are used by the different jobs. Typically, most schedulers are able to resolve simple dependencies, which condition the submission of the jobs to a particular resource or set of resources. Moreover, the schedulers help an operator select the jobs to be submitted whenever their number exceeds a maximum allowable value (limiting the number of jobs that are running concurrently to avoid excessive contention for the resources of the system). For example, the operator can assign a weight to each job (representing a supposed impact of the job on the system performance); those weights are used by the scheduler to assign different priorities to the jobs to be submitted.

A drawback of the solutions described above is the lack of any efficient support for distributing and balancing a workload of the system. Indeed, the weights assigned to the jobs by the operator are very inaccurate in nature; moreover, those weights do not take into account a history of the different jobs. In any case, the proposed approach is unable to prevent overloading specific resources of the system (for example, when several jobs that are very intensive on the same resource are submitted at the same time).

Document U.S. Pat. No. 6,591,262 discloses a system wherein the scheduler collaborates with a workload manager. The workload manager is a software component (included in an operating system), which manages the resources that are allocated to the different running jobs. In the proposed system, the scheduler maintains a profile for each job; the profile (built using statistics of previous executions of the job) defines an estimated usage of different resources of the system by the job. Whenever the job is submitted for execution, the corresponding profile is attached and passed to the workload manager. In this way, the workload manager can optimize the allocation of the resources of the system to the different running jobs.

However, the solution described in the cited document only acts on the jobs that are already in execution. Therefore, the proposed technique is unable to prevent the submission of potentially competing jobs. In any case, the advantageous effects of the devised collaborative scheme can only be achieved in systems wherein the operating system includes a workload manager, which has been adapted to receive the profiles from the scheduler.

SUMMARY OF THE INVENTION

It is an object of the present invention to provide a scheduling method, which supports an efficient distribution and balancing of the workload of the system.

It is another object of the present invention to improve the usage of the different resources, in order to increase the throughput of the system.

It is yet another object of the present invention to avoid overloading specific resources of the system.

Moreover, it is an object of the present invention to prevent the submission of potentially competing jobs.

It is another object of the present invention to minimize resource contention by the jobs.

It is yet another object of the present invention to improve the distribution and balancing of the workload in systems without any workload manager (or with a workload manager that is unable to receive the profiles from the scheduler).

The accomplishment of these and other related objects is achieved by a method of scheduling submission of work units for execution on a data processing system, the method including the steps of: providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and submitting the selected work units.

The present invention also provides a computer program for performing the method and a product storing the program. A corresponding structure for implementing the method is also encompassed.

The novel features believed to be characteristic of this invention are set forth in the appended claims. The invention itself, however, as well as these and other related objects and advantages thereof, will be best understood by reference to the following detailed description to be read in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of a computer in which the method of the invention is applicable;

FIG. 2 depicts the main software components that can be used for practicing the method;

FIGS. 3a-3c show a diagram describing the flow of activities relating to an illustrative implementation of the method.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

With reference in particular to FIG. 1, a schematic block diagram of a computer 100 (for example, a mainframe) is shown. The computer 100 is formed by several units, which are connected in parallel to a system bus 105. In detail, multiple microprocessors (μP) 110 control operation of the computer 100; a DRAM 115 (typically consisting of interleaved modules) is directly used as a shared working memory by the microprocessors 110, and a ROM 120 stores basic code for a bootstrap of the computer 100. Several peripheral units are clustered around a local bus 125 (by means of respective interfaces). Particularly, a mass memory consists of a hard-disk 130 and a driver 135 for reading CD-ROMs 140. Moreover, the computer 100 includes input devices 145 (for example, a keyboard and a mouse), and output devices 150 (for example, a monitor and a printer). A Network Interface Card (NIC) 155 is used to connect the computer 100 to a network. A bridge unit 160 interfaces the system bus 105 with the local bus 125. Each microprocessor 110 and the bridge unit 160 can operate as master agents requesting an access to the system bus 105 for transmitting information. An arbiter 165 manages the granting of the access with mutual exclusion to the system bus 105.

Similar considerations apply if the computer has a different structure (for example, with a single bus) or includes other units (for example, drivers for magnetic tapes). However, the concepts of the present invention are also applicable when the computer consists of a mini-system, or when the computer is replaced with an equivalent data processing system (such as a network of workstations).

Moving to FIG. 2, the main software components that can be used to practice the method of the invention are depicted. The information (programs and data) is typically stored on the hard-disk and loaded (at least partially) into the working memory of the computer when the programs are running, together with an operating system and other application programs (not shown in the figure). The programs are initially installed onto the hard-disk from CD-ROM.

An operating system 202 provides a software platform for the above-described computer, on top of which other programs can run. Particularly, a workload scheduler 203 is installed on the computer. The scheduler 203 includes a controller 205 for managing execution of a series of non-interactive jobs (typically during the evening); for example, the jobs consist of payroll programs, cost analysis applications, and the like. The controller 205 accesses a workload database 210, which stores information about the different jobs to be executed. For each job, the workload database 210 includes a description of the corresponding steps, a planned time of execution, and any dependency on other jobs or resources of the system; moreover, the workload database 210 stores a record indicating an estimated duration of the job.

A profile is also associated with each job in the workload database 210 (or at least with the ones that are run regularly). The profile includes multiple attributes of the job; each attribute is indicative of the usage of a corresponding resource of the computer, which is likely to be required by the job during its execution. Preferably, an attribute of the profile represents an estimated processing power consumption. A different attribute indicates an estimated (working) memory requirement. A further attribute specifies an estimated input/output activity. Typically, the different attributes are expressed as percentage values. The controller 205 transmits the description of each job to be executed (stored in the workload database 210) to a builder 215. The builder 215 creates plans 220 for controlling a flow of execution of batches of jobs in a determined sequence; each plan 220 is built according to a desired scheduling strategy (for example, to balance a load of the computer or to optimize its peak performance). The plan 220 is supplied, through the controller 205, to an executor 225.

As described in detail in the following, the executor 225 selects the jobs to be run according to the plan 220; the selected jobs are then submitted for execution to the operating system 202. The jobs are received by the operating system 202 via a job entry subsystem 235. The job entry subsystem 235 controls the running of a current instance of each submitted job (denoted with 240). Moreover, the job entry subsystem 235 interfaces with a workload manager 245 (included in the operating system 202 as well). The workload manager 245 monitors the running jobs 240; the workload manager 245 allocates the appropriate resources of the computer to the different running jobs 240, in order to optimize load balancing and overall performance.

Once the current instance of a generic job terminates its execution (because all the operations have been completed or an error has occurred), feedback information is returned to the controller 205 via the executor 225; the feedback information includes an actual start time and an actual end time of the terminated instance of the job, a return code specifying the result of the operations, and the like. The controller 205 uses this information to calculate a duration of the terminated job, in order to predict how long the job should run in the future; the corresponding record indicating the estimated duration of the job is updated accordingly in the workload database 210.

At the same time, a reporting module 250 collects statistics about the terminated job; for example, the statistics include the consumption of processing power (expressed in microprocessor time units, such as seconds), the memory usage (expressed in number of bytes), the input/output activity (expressed in number of performed operations), and the like. The information collected by the reporting module 250 is logged into a job statistics database 255. A profile updating module 260 captures the statistics of the terminated job (before they are written to the database 255). The module 260 uses these statistics to update the corresponding profile of the job in the workload database 210. In addition or as an alternative, a batch scanner 265 periodically imports all the statistics (of the instances of the jobs that have been previously executed) from the database 255. The profile updating module 260 uses these statistics for performing a bulk update of the profiles in the workload database 210 on a regular basis (for example, every day).

Similar considerations apply if the programs and the corresponding data are structured in another way, if different modules or functions are supported, or if the programs are provided on an equivalent computer readable medium (such as one or more floppy-disks). Alternatively, the jobs are described in a different way in the workload database, the attributes are expressed with equivalent values, or the profiles of the jobs include other information; likewise, the scheduler receives equivalent feedback information for each terminated job, the statistics are collected in a different way, or the job statistics database includes other information (for example, the number of consumed service units, defined as an intelligent mix of various factors). In any case, the concepts of the present invention are also applicable when the profiles are used by the workload manager to adjust the distribution of the resources that are allocated to the running jobs, or even when the operating system does not include any workload manager. Moreover, the invention is also suitable to be used for scheduling submission of different jobs, interactive tasks, or more generally any other work unit.

With reference now to FIGS. 3a-3c, the scheduler (when running on the computer) performs a method 300 that begins at the black start circle 303 in the swim-lane of the controller. Continuing to block 306, a desired plan (for example, selected by an operator through a graphical user interface of the scheduler) is submitted to the executor.

The swim-lane of the executor includes two branches that are executed in parallel. A first branch consists of blocks 309-348, and a second branch consists of blocks 350-365; the two branches join at block 368.

Considering in particular the branch 309-348, the executor at block 309 identifies the jobs that are eligible for submission (according to their planned time of execution and their dependencies). The process then branches at block 315 according to the mode of operation of the scheduler. If the scheduler is configured to operate in a heuristic mode, the blocks 318-334 are executed, while if the scheduler is configured to operate in an optimization mode, the blocks 335-342 are executed; in both cases, the flow of activity merges again at block 345.
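The eligibility test of block 309 can be illustrated with a minimal sketch. It assumes that each job of the plan carries a planned time and a set of predecessor jobs; the `PlannedJob` record, its field names and the `eligible_jobs` function are hypothetical, and dependencies on system resources are left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PlannedJob:
    # Hypothetical record; the field names are not taken from the description.
    name: str
    planned_time: datetime
    predecessors: set = field(default_factory=set)


def eligible_jobs(plan, completed, submitted, now):
    """Return the names of the jobs whose planned time has come and whose
    predecessor jobs have all completed, skipping jobs already submitted."""
    return [
        job.name
        for job in plan
        if job.name not in submitted
        and job.planned_time <= now
        and job.predecessors <= completed  # subset test on sets of job names
    ]


plan = [
    PlannedJob("payroll", datetime(2024, 1, 1, 20, 0)),
    PlannedJob("report", datetime(2024, 1, 1, 20, 0), {"payroll"}),
    PlannedJob("archive", datetime(2024, 1, 1, 23, 0)),
]
ready = eligible_jobs(plan, completed=set(), submitted=set(),
                      now=datetime(2024, 1, 1, 21, 0))
```

In this illustrative plan only `payroll` is eligible at 21:00: `report` is still waiting for its predecessor, and the planned time of `archive` has not yet been reached.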

When the scheduler operates in the heuristic mode (blocks 318-334), the executor at block 318 creates a list for each attribute taken into consideration (processing-usage, memory-usage and I/O-usage, respectively); each list orders the eligible jobs according to the corresponding attributes (for example, in a decreasing order from the most resource-consuming job to the least resource-consuming job). A loop 321-328 is then entered for establishing a priority order of the eligible jobs; for this purpose, a preference sequence is built by alternately extracting the first jobs and the last jobs, respectively, from the above-described lists. Particularly, if the eligible jobs have been extracted from the end of the lists during a previous iteration of the loop (decision block 321), the first eligible jobs of the lists are selected and inserted into the preference sequence at block 324 (the same operation is also executed at a first iteration of the loop); conversely, if the eligible jobs have been extracted from the beginning of the lists during the previous iteration of the loop, the last eligible jobs of the lists are selected and inserted into the preference sequence at block 327. In both cases, the method then verifies at block 328 whether all the eligible jobs have been extracted from the lists. If not, the flow of activity returns to block 321 for repeating the operations described above. Conversely, the loop ends and the preference sequence so obtained is reduced at block 329, removing any duplication of the eligible jobs after their first occurrences.
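The construction of the preference sequence (blocks 318-329) can be sketched as follows. This is only an illustration under stated assumptions: the profiles are held in a plain dictionary, the attribute keys `cpu`, `mem` and `io` and the function name are hypothetical, and the sample values are those of the five-job example given further on in this description.

```python
from collections import deque

# Sample profiles (processing, memory and I/O usage) of five eligible jobs.
PROFILES = {
    "J1": {"cpu": 10, "mem": 2, "io": 3},
    "J2": {"cpu": 5, "mem": 3, "io": 6},
    "J3": {"cpu": 1, "mem": 10, "io": 5},
    "J4": {"cpu": 2, "mem": 12, "io": 1},
    "J5": {"cpu": 3, "mem": 1, "io": 2},
}


def preference_sequence(profiles, attributes=("cpu", "mem", "io")):
    """Build the reduced preference sequence of the heuristic mode."""
    # Block 318: one list per attribute, in decreasing order of usage.
    lists = [
        deque(sorted(profiles, key=lambda job: profiles[job][attr], reverse=True))
        for attr in attributes
    ]
    raw = []
    take_first = True  # the first iteration extracts from the head of the lists
    # Loop 321-328: alternate between the heads and the tails of the lists.
    while any(lists):
        for ordered in lists:
            if ordered:
                raw.append(ordered.popleft() if take_first else ordered.pop())
        take_first = not take_first
    # Block 329: keep only the first occurrence of each job, preserving order.
    return list(dict.fromkeys(raw))


sequence = preference_sequence(PROFILES)
```

With the sample profiles the reduced sequence is J1, J4, J2, J3, J5, so that the heaviest job of each resource is considered first.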

The process continues to block 330, wherein the executor retrieves the profiles of the running jobs and the eligible jobs from the workload database. For each resource taken into consideration (processing power, working memory and input/output activity), a current usage is estimated at block 331 by summing the corresponding attributes of all the running jobs. A test is then made at block 332 to verify whether a predefined threshold condition is still met should the first eligible job (in the preference sequence) be submitted; for example, the threshold condition specifies a maximum allowable processing-usage, a maximum allowable memory-usage and a maximum allowable I/O-usage. If so (i.e., if the current usage of each resource with the addition of the corresponding attribute of the first eligible job does not exceed its maximum usage), the first eligible job is selected and removed from the preference sequence at block 333. The process continues to block 334, wherein the current processing-usage, the current memory-usage and the current I/O-usage are updated accordingly (adding the corresponding attributes of the selected eligible job). The flow of activity then descends into block 345; the same block is reached from block 332 directly when the threshold condition is not met or the preference sequence is empty (since all the eligible jobs have been selected).
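The threshold test of blocks 330-334 can be sketched as a loop over the preference sequence. The sketch assumes that each limit is a strict upper bound (usage lower than the limit, as in the worked example of this description) and that the selection stops at the first job that would violate a limit; the function name and the dictionary keys are hypothetical.

```python
PROFILES = {
    "J1": {"cpu": 10, "mem": 2, "io": 3},
    "J2": {"cpu": 5, "mem": 3, "io": 6},
    "J3": {"cpu": 1, "mem": 10, "io": 5},
    "J4": {"cpu": 2, "mem": 12, "io": 1},
    "J5": {"cpu": 3, "mem": 1, "io": 2},
}


def select_within_thresholds(sequence, profiles, current, limits):
    """Select jobs from the head of the preference sequence while every
    projected resource usage stays strictly below its limit."""
    usage = dict(current)  # block 331: usage estimated from the running jobs
    selected = []
    for job in sequence:
        projected = {res: usage[res] + profiles[job][res] for res in limits}
        # Block 332: stop as soon as one resource would reach its limit.
        if any(projected[res] >= limits[res] for res in limits):
            break
        selected.append(job)  # block 333
        usage = projected     # block 334
    return selected, usage


selected, usage = select_within_thresholds(
    ["J1", "J4", "J2", "J3", "J5"],
    PROFILES,
    current={"cpu": 32, "mem": 38, "io": 57},
    limits={"cpu": 55, "mem": 60, "io": 75},
)
```

With these sample figures the jobs J1, J4 and J2 are selected; J3 is rejected because it would raise the memory usage to 65, above the limit of 60.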

On the other hand, when the scheduler operates in the optimization mode (blocks 335-342), a test is made at block 335 to determine whether the number of eligible jobs exceeds a maximum allowable value; the maximum value is defined so as to limit the number of jobs that are running concurrently (thereby avoiding excessive contention for the resources of the computer). If so, the blocks 336-342 are executed, and the process then continues to block 345; conversely, the flow of activity descends into block 345 directly.

Considering now block 336 (number of eligible jobs higher than the maximum value), the executor retrieves the profiles of the eligible jobs from the workload database. An objective function modeling a distribution of the usage of the different resources is defined; the combination of the eligible jobs that optimizes the objective function (among all the possible combinations) is then selected. In detail, for each combination (starting from a first one) the executor at block 337 calculates a parameter representing a total usage of each resource (by summing the corresponding attributes of all the eligible jobs of the combination). The process continues to block 339, wherein a discontinuance factor is determined by summing the differences (in absolute value) between each pair of those total resource-usage parameters. A test is then made at block 340 to verify whether a last combination has been processed. If not, the method returns to block 337 for repeating the same operations for a next combination. Conversely, the eligible jobs of the combination that exhibits the lowest discontinuance factor are selected at block 342.
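The exhaustive search of blocks 336-342 can be sketched with the standard `itertools.combinations` function. The sketch assumes that every candidate combination has a fixed size equal to the maximum allowable number of jobs (3 in the sample call); the function names and the dictionary keys are hypothetical.

```python
from itertools import combinations

PROFILES = {
    "J1": {"cpu": 10, "mem": 2, "io": 3},
    "J2": {"cpu": 5, "mem": 3, "io": 6},
    "J3": {"cpu": 1, "mem": 10, "io": 5},
    "J4": {"cpu": 2, "mem": 12, "io": 1},
    "J5": {"cpu": 3, "mem": 1, "io": 2},
}
RESOURCES = ("cpu", "mem", "io")


def discontinuance_factor(jobs, profiles, resources=RESOURCES):
    """Sum of the absolute differences between each pair of total usages."""
    # Block 337: total usage of each resource over the jobs of the combination.
    totals = [sum(profiles[job][res] for job in jobs) for res in resources]
    # Block 339: pairwise absolute differences between those totals.
    return sum(abs(a - b) for a, b in combinations(totals, 2))


def best_combination(profiles, size, resources=RESOURCES):
    """Blocks 340-342: return the combination with the lowest factor."""
    return min(
        combinations(profiles, size),
        key=lambda jobs: discontinuance_factor(jobs, profiles, resources),
    )


best = best_combination(PROFILES, size=3)
```

For the sample profiles the search returns J1, J2 and J3, whose discontinuance factor is 4. Since the number of combinations grows quickly with the number of eligible jobs, this brute-force form is suited only to small sets.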

Considering now block 345, the selected jobs are submitted for execution. The executor then verifies at block 348 whether all the jobs of the plan have been submitted. If not, the flow of activity returns to block 309 for repeating the operations described above on the jobs of the plan still to be submitted. Conversely, the execution of the branch ends at block 368.

At the same time, in the other branch 350-365 the executor is in a waiting condition at block 350. As soon as a generic job terminates its execution, the corresponding feedback information is returned to the controller at block 353. In response thereto, the controller at block 359 calculates the duration of the terminated job by subtracting its start time from its end time. Continuing to block 362, the controller uses the value so calculated to update the estimated duration of the job; for example, the estimated duration is determined as a running average of the values that have been measured for completed instances of the job (preferably filtering very different values as anomalies).

Returning to the swim-lane of the executor, a test is made at block 365 to determine whether all the jobs of the plan have been terminated. If not, the flow of activity returns to block 350 waiting for the termination of a further job. Conversely, the execution of the branch ends at block 368.

Concurrently, the termination of the job also triggers the collection of the corresponding statistics by the reporting module at block 377. Proceeding to block 380, the collected information is logged into the job statistics database. The statistics are also captured by the profile updating module at block 383 (in the respective swim-lane). These statistics are then used at block 386 to update the profile of the job. For example, each attribute of the job is updated to a corresponding running average of the values that have been measured for completed instances of the job; preferably, the profile updating module can be tuned with user-adjustable parameters that define a smoothing factor and an anomaly identifying limit (which are used to discard very different values).
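The description does not give the exact formula of the running average, so the following is only one possible reading: an exponentially smoothed average in which a new measurement is discarded when it differs from the current estimate by more than a given fraction of that estimate. The function name, the parameter names and the default values are all assumptions.

```python
def update_estimate(current, sample, smoothing=0.2, anomaly_limit=2.0):
    """Return the new estimate of an attribute after one measured value.

    smoothing     -- weight given to the new measurement (0 < smoothing <= 1)
    anomaly_limit -- a measurement is ignored when its distance from the
                     current estimate exceeds anomaly_limit * current
    """
    if current is None:
        # No history yet: the first measurement becomes the estimate.
        return float(sample)
    if current > 0 and abs(sample - current) > anomaly_limit * current:
        # Very different value: treated as an anomaly and discarded.
        return current
    # Move the estimate towards the measurement by the smoothing factor.
    return current + smoothing * (sample - current)


estimate = update_estimate(10.0, 12.0, smoothing=0.5, anomaly_limit=0.5)
unchanged = update_estimate(10.0, 100.0, smoothing=0.5, anomaly_limit=0.5)
```

In the two sample calls the first measurement moves the estimate from 10 to 11, while the second one (100) lies outside the anomaly limit and leaves the estimate at 10. The same update could be applied to the estimated duration of the job as well as to each attribute of its profile.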

Referring back to the swim-lane of the executor, the two branches described above join at block 368 and the flow of activity returns to the controller. In response thereto, the controller at block 389 logs a result of the execution of the plan. The process then ends at the concentric white/black stop circles 392.

For example, let us consider 5 eligible jobs {J₁, J₂, J₃, J₄, J₅}; the profile of each eligible job is defined by the processing-usage attribute (denoted with $P_i$, i=1...5), the memory-usage attribute (denoted with $M_i$) and the I/O-usage attribute (denoted with $I_i$):

-   J₁={P₁=10, M₁=2, I₁=3}
-   J₂={P₂=5, M₂=3, I₂=6}
-   J₃={P₃=1, M₃=10, I₃=5}
-   J₄={P₄=2, M₄=12, I₄=1}
-   J₅={P₅=3, M₅=1, I₅=2}

When the scheduler operates in the heuristic mode, the lists for the processing-usage attributes, for the memory-usage attributes and for the I/O-usage attributes are:

-   Processing-usage list={J₁=10, J₂=5, J₅=3, J₄=2, J₃=1}
-   Memory-usage list={J₄=12, J₃=10, J₂=3, J₁=2, J₅=1}
-   I/O-usage list={J₂=6, J₃=5, J₁=3, J₅=2, J₄=1}

As a consequence, the preference sequence is created as follows:

-   {J₁, J₄, J₂, J₃, J₅, J₄, J₂, J₃, J₃, J₄, J₁, J₂, J₅, J₂, J₃}

and is then reduced to:

-   {J₁, J₄, J₂, J₃, J₅}

The algorithm requires the executor to estimate a current usage of each resource (according to the attributes of the running jobs); for example, the current processing-usage (denoted with Cp) is 32, the current memory-usage (denoted with Cm) is 38 and the current I/O-usage (denoted with Ci) is 57. Let us assume that the threshold condition specifies that the processing-usage must be lower than 55, the memory-usage must be lower than 60 and the I/O-usage must be lower than 75:

-   (Cp<55) AND (Cm<60) AND (Ci<75)

In this situation, the threshold condition is still met if the first eligible job in the preference sequence (J₁) is submitted; indeed, the current usage of each resource with the addition of the corresponding attribute of the eligible job J₁ does not exceed the respective maximum usage:

-   Cp=32+10=42<55
-   Cm=38+2=40<60
-   Ci=57+3=60<75

Therefore, the eligible job J₁ is selected and removed from the preference sequence; at the same time, the current usage of each resource is updated by adding the corresponding attribute of the selected job J₁ (Cp=42, Cm=40 and Ci=60). Likewise, the threshold condition is still met if the new first eligible job in the preference sequence (J₄) is submitted:

-   Cp=42+2=44<55
-   Cm=40+12=52<60
-   Ci=60+1=61<75

Therefore, the eligible job J₄ is selected and the current usage of each resource is updated accordingly (Cp=44, Cm=52 and Ci=61). The same operations are repeated for the next eligible job J₂ (with Cp=49, Cm=55 and Ci=67). In this situation, however, the threshold condition is not met any longer if the further eligible job (J₃) is submitted, since:

-   Cp=49+1=50<55
-   Cm=55+10=65 not <60
-   Ci=67+5=72<75

Therefore, the executor selects the eligible jobs J₁, J₄ and J₂ for execution.

The above-described algorithm combines the most resource-intensive jobs with the least resource-intensive jobs (for each resource); therefore, this method allows selecting, with a good approximation, the eligible jobs that use different resources. Preferably, the selection starts from the most resource-intensive jobs. In this way, each resource is allocated to the heavy jobs as far as possible; the less resource-intensive jobs can then be used to exploit any residual availability of the resources.

On the other hand, when the scheduler operates in the optimization mode, the possible combinations of three eligible jobs out of five are $\binom{5}{3} = \frac{5!}{3!\,(5-3)!} = 10$, that is:

-   {J₁, J₂, J₃}
-   {J₁, J₂, J₄}
-   {J₁, J₂, J₅}
-   {J₁, J₃, J₄}
-   {J₁, J₃, J₅}
-   {J₁, J₄, J₅}
-   {J₂, J₃, J₄}
-   {J₂, J₃, J₅}
-   {J₂, J₄, J₅}
-   {J₃, J₄, J₅}

For each combination $C_j$ (with j=1...10), the total processing-usage parameter (denoted with $Tp_j$), the total memory-usage parameter (denoted with $Tm_j$), and the total I/O-usage parameter (denoted with $Ti_j$) are defined by the following formulas (wherein each summation relates to the attributes of the corresponding eligible jobs):

-   $Tp_j = \sum_{J_i \in C_j} P_i$
-   $Tm_j = \sum_{J_i \in C_j} M_i$
-   $Ti_j = \sum_{J_i \in C_j} I_i$

In the example at issue, we have:

-   C₁={J₁, J₂, J₃}: Tp₁=16, Tm₁=15, Ti₁=14
-   C₂={J₁, J₂, J₄}: Tp₂=17, Tm₂=17, Ti₂=10
-   C₃={J₁, J₂, J₅}: Tp₃=18, Tm₃=6, Ti₃=11
-   C₄={J₁, J₃, J₄}: Tp₄=13, Tm₄=24, Ti₄=9
-   C₅={J₁, J₃, J₅}: Tp₅=14, Tm₅=13, Ti₅=10
-   C₆={J₁, J₄, J₅}: Tp₆=15, Tm₆=15, Ti₆=6
-   C₇={J₂, J₃, J₄}: Tp₇=8, Tm₇=25, Ti₇=12
-   C₈={J₂, J₃, J₅}: Tp₈=9, Tm₈=14, Ti₈=13
-   C₉={J₂, J₄, J₅}: Tp₉=10, Tm₉=16, Ti₉=9
-   C₁₀={J₃, J₄, J₅}: Tp₁₀=6, Tm₁₀=23, Ti₁₀=8

The discontinuance factor (denoted with $DF_j$) for each combination $C_j$ is obtained by applying the formula:

$DF_j = |Tp_j - Tm_j| + |Tp_j - Ti_j| + |Tm_j - Ti_j|$

Therefore, we have:

-   C₁={J₁, J₂, J₃}: DF₁=4
-   C₂={J₁, J₂, J₄}: DF₂=14
-   C₃={J₁, J₂, J₅}: DF₃=24
-   C₄={J₁, J₃, J₄}: DF₄=30
-   C₅={J₁, J₃, J₅}: DF₅=8
-   C₆={J₁, J₄, J₅}: DF₆=18
-   C₇={J₂, J₃, J₄}: DF₇=34
-   C₈={J₂, J₃, J₅}: DF₈=10
-   C₉={J₂, J₄, J₅}: DF₉=14
-   C₁₀={J₃, J₄, J₅}: DF₁₀=34

The best combination of eligible jobs (identified by the minimum discontinuance factor DF₁=4) is then C₁={J₁, J₂, J₃}.
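As a check on the figures above, the formula can be expanded for the best combination and for one of the two worst ones, using only the totals already listed:

$$DF_1 = |16-15| + |16-14| + |15-14| = 1 + 2 + 1 = 4$$

$$DF_7 = |8-25| + |8-12| + |25-12| = 17 + 4 + 13 = 34$$

The combination C₇ scores poorly because its total memory usage (25) far exceeds its total processing usage (8), whereas the three totals of C₁ lie within two units of one another.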

In this case, the consumption of the resources of the computer is uniformly distributed (as far as possible) among the different resources.

In both modes of operation, intelligence is added to the scheduler, which tends to select jobs with complementary resource requirements. As a result, the jobs that will be submitted are very likely to consume different resources of the system. For example, the scheduler can select a job that is very intensive on the processing power together with other jobs having low processing power requirements. In this way, the selected jobs should not compete for the processing power of the system. However, the jobs with low processing power requirements can be very intensive on other resources of the system (such as the memory). In this way, the overall performance of the system is strongly increased (since the usage of each resource is optimized individually).

Similar considerations apply if an equivalent method is performed, or if some functions are executed by different modules. In any case, the concepts of the present invention are also applicable when the profiles are determined by a module embedded in the scheduler itself, when the attributes of the jobs are updated by applying other algorithms, or when the scheduler supports different modes of operation (down to a single one). Alternatively, the threshold condition is defined only taking into account some of the resources, the eligible jobs are selected preferring the less resource-intensive ones, or the maximum value is calculated dynamically.

More generally, the present invention proposes a method of scheduling submission of work units for execution on a data processing system. For this purpose, a plurality of attributes is provided for each work unit; each attribute is indicative of the usage of a corresponding resource of the system by the work unit. The method involves the selection of a subset of the work units for optimizing the usage of each resource individually (according to a corresponding combination of the attributes). The selected work units are then submitted.

The method of the invention provides an efficient distribution and balancing of the workload of the system.

In this way, the usage of the different resources is strongly improved, thereby increasing the throughput of the system.

The devised solution avoids overloading specific resources of the system.

The method of the invention makes it possible to prevent the submission of potentially competing jobs.

As a consequence, any resource contention (caused by the submitted jobs) is reduced.

The above-mentioned advantages can be experienced even in systems without any workload manager (or with a workload manager that is unable to receive the profiles from the scheduler); however, the use of the proposed solution in different environments is not excluded and is within the scope of the present invention.

The preferred embodiment of the invention described above offers further advantages.

Particularly, the attributes for each job are estimated using statistics that have been measured for previous executions of the job.

In this way, the behavior of the next instances of the jobs can be predicted with a high degree of accuracy.

In a preferred implementation of the invention, the profile for each job includes an attribute indicative of the usage of the processing power of the system, another attribute indicative of the usage of the memory of the system and/or a further attribute indicative of the input/output activity of the job.

Simulation results have shown that the processing-usage attribute is the most important factor for optimizing the workload balancing of the system. The memory-usage attribute has proved to be very important as well for increasing the throughput of the system. Moreover, the I/O-usage attribute further improves the performance of the proposed method.

However, the solution according to the present invention lends itself to be implemented determining the profiles of the jobs in another way, and even with some attributes that are defined by the operator. Alternatively, the scheduler supports two or more different attributes (for example, an attribute for the usage of network facilities, attributes for the usage of specific I/O peripherals, and the like).

In a particular embodiment of the invention, the selection of the jobs is based on a heuristic approach.

This approach is not optimal, but the loss of precision in the distribution of the workload is more than compensated for by the computational simplicity.

As a further enhancement, the threshold condition is indicative of the maximum allowable usage of one or more resources of the system.

The proposed feature avoids an excessive contention for specific resources of the system.

Preferably, the eligible jobs are selected using the above-described algorithm.

This algorithm has proved to be very efficient in many practical situations.
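The heuristic selection can be illustrated with a short sketch. It follows one possible reading of the algorithm: a list is built for each resource, the jobs are extracted alternately from the head and from the tail of those lists to form a priority order, and jobs are then selected in succession while a maximum allowable usage is respected. The function names, the descending ordering of the lists, the round-by-round alternation and the choice to stop at the first job that exceeds a threshold are all assumptions of this sketch, not details confirmed by the text.

```python
def priority_order(jobs):
    """Build a priority order from per-resource lists (assumed reading).

    jobs maps a job name to its attributes, e.g. {"cpu": 40, "memory": 10}.
    """
    resources = sorted({r for attrs in jobs.values() for r in attrs})
    if not resources:
        return list(jobs)
    # One list per resource, heaviest user of that resource first.
    lists = {
        r: sorted(jobs, key=lambda name: jobs[name].get(r, 0), reverse=True)
        for r in resources
    }
    order, taken = [], set()
    take_first = True
    while len(order) < len(jobs):
        for r in resources:
            remaining = [name for name in lists[r] if name not in taken]
            if not remaining:
                break
            # Alternate between the first and the last job of each list.
            pick = remaining[0] if take_first else remaining[-1]
            order.append(pick)
            taken.add(pick)
        take_first = not take_first
    return order


def select_jobs(jobs, running_usage, max_usage):
    """Select jobs in priority order while no resource exceeds its maximum."""
    totals = dict(running_usage)
    selected = []
    for name in priority_order(jobs):
        candidate = {r: totals.get(r, 0) + v for r, v in jobs[name].items()}
        if any(v > max_usage.get(r, float("inf")) for r, v in candidate.items()):
            break  # threshold condition no longer met
        totals.update(candidate)
        selected.append(name)
    return selected
```

Mixing the heaviest and the lightest users of each resource in the priority order tends to place jobs with complementary requirements next to each other, which is the intent of the heuristic.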

A different implementation of the invention makes use of optimization techniques.

This solution ensures the best performance of the method (at the cost of an increased computational complexity).

A suggested choice for the objective function to be optimized consists of the above-described discontinuance factor.

The proposed algorithm is quite simple, but at the same time efficient.
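A minimal sketch of the optimization mode is given below. The discontinuance factor is computed as the sum of the absolute differences between each pair of per-resource totals. The exhaustive search over combinations of a fixed size, the assumption that the factor is to be minimized, and the assumption that the attributes of different resources are expressed on a comparable scale are choices made for this illustration; the names `discontinuance_factor` and `best_combination` are hypothetical.

```python
from itertools import combinations


def discontinuance_factor(combo, jobs, resources):
    """Sum of |total_i - total_j| over each pair of resource totals."""
    totals = [sum(jobs[name].get(r, 0) for name in combo) for r in resources]
    return sum(abs(a - b) for a, b in combinations(totals, 2))


def best_combination(jobs, resources, size):
    """Exhaustively find the combination with the smallest factor (assumed goal)."""
    return min(
        combinations(sorted(jobs), size),
        key=lambda combo: discontinuance_factor(combo, jobs, resources),
    )
```

The exhaustive search examines every combination of the given size, which reflects the increased computational complexity of this mode with respect to the heuristic one.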

In any case, the scheduler can use alternative algorithms when operating either in the heuristic mode or in the optimization mode. For example, in the heuristic mode the number of eligible jobs to be selected is predefined, or the threshold condition is defined in another way (for example, only taking into account the eligible jobs); on the other hand, in the optimization mode the discontinuance factor is calculated with a different formula, or another factor is minimized/maximized. However, the use of one or more different approaches is contemplated and falls within the scope of the invention.

Advantageously, the solution according to the present invention is implemented with a computer program, which is provided as a corresponding product stored on a suitable medium.

Alternatively, the program is pre-loaded onto the hard disk, is sent to the computer through a network (typically the Internet), is broadcast, or more generally is provided in any other form directly loadable into a working memory of the computer. However, the method according to the present invention lends itself to being carried out with a hardware structure (for example, integrated in a chip of semiconductor material), or with a combination of software and hardware.

Naturally, in order to satisfy local and specific requirements, a person skilled in the art may apply to the solution described above many modifications and alterations, all of which, however, are included within the scope of protection of the invention as defined by the following claims.

1-11. canceled
12. A method of scheduling submission of work units for execution on a data processing system, the method including the steps of: providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and submitting the selected work units.
13. The method according to claim 1, wherein the step of providing the plurality of attributes for each work unit includes: measuring a plurality of values, each one representative of a corresponding attribute, for at least one instance of the work unit executed previously, and estimating each attribute for the work unit according to the corresponding measured values.
14. The method according to claim 1, wherein the plurality of attributes for each work unit includes a first attribute indicative of a usage of a processing power, a second attribute indicative of a usage of a memory of the system, and/or a third attribute indicative of a usage of an input/output activity of the system.
15. The method according to claim 1, wherein the step of selecting the subset of the work units includes: establishing a priority order for the work units, and selecting the work units in succession according to the priority order while a predefined condition is met.
16. The method according to claim 4, wherein the predefined condition is indicative of a maximum allowable usage of at least one of the resources by the work units in execution and the selected work units.
17. The method according to claim 4, wherein the step of establishing the priority order for the work units includes: creating a list for each resource, the list ordering the work units according to the attributes corresponding to the resource, and alternately extracting each first work unit or each last work unit from the lists.
18. The method according to claim 1, wherein the step of selecting the subset of the work units includes: defining an objective function indicative of a distribution of the usage of the resources of the system, and identifying a combination of the work units optimizing the objective function.
19. The method according to claim 7, wherein the step of defining the objective function includes, for each combination of the eligible work units: calculating a total parameter for each resource by summing the attributes corresponding to the resource for each eligible work unit of the combination, and calculating a discontinuance factor by summing the absolute value of the differences between each pair of total parameters.
20. A computer program, directly loadable into a working memory of a data processing system, for performing a method of scheduling submission of work units for execution on the data processing system when the program is run on the system, the method including the steps of: providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and submitting the selected work units.
21. A program product comprising a computer readable medium on which a program is stored, the program being directly loadable into a working memory of a data processing system for performing a method of scheduling submission of work units for execution on the data processing system when the program is run on the system, the method including the steps of: providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and submitting the selected work units.
22. A workload scheduler for scheduling submission of work units for execution on a data processing system, the scheduler including program code for: providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and submitting the selected work units.
23. A structure for scheduling submission of work units for execution on a data processing system, the structure including means for providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, means for selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes, and means for submitting the selected work units.
24. A structure for scheduling submission of work units for execution on a data processing system, the structure including a profile updating module for providing a plurality of attributes for each work unit, each attribute being indicative of the usage of a corresponding resource of the system by the work unit, and an executor module for selecting a subset of the work units for optimizing the usage of each resource individually according to a corresponding combination of the attributes and for submitting the selected work units.